Abstract

We introduce a very fast and reliable algorithm for shape similarity retrieval in large image 
databases which is robust with respect to noise, scale and orientation changes of the objects. 
The Curvature Scale Space image is used to represent the shapes of object boundary contours. 
Since the algorithm uses the global information of boundaries of objects, it is sensitive to 
major occlusion, but some minor occlusion can be detected by the algorithm. 

We tested our method on a database of 450 images of marine animals with a vast variety 
of shapes, with very good results which we present in this paper. 

1 Introduction 

Most proposed content based database systems aim to retrieve a small set of candidate images 
which include the desired image. The successful retrieval of the best candidate then relies 
on the final user judgement. In [1], the authors used polygonal approximation, while
a set of features such as boundary/perimeter, elongation (major axis/minor axis), number of
holes, etc., was used in [2] for shape similarity retrieval. The authors in [3] have used
a combination of heuristic shape features such as area, circularity, eccentricity, major axis 
orientation and a set of algebraic moment invariants. They have also used other features such 
as color, texture, and even sketch features. 

We use a modified version of Curvature Scale Space image matching [6] for comparing 
shapes of objects in an image database. Our prototype database includes more than 450 
colored images of marine animals, with every image containing one animal. The preprocessing 
step (consisting of gray-level morphology, thresholding and binary morphology) extracts the
boundaries of objects. Other techniques such as active contours [4] can also be incorporated 
at this stage if necessary. 

We compute the CSS image of every boundary and then find the maxima of CSS contours 
which are used as a shape descriptor to compare objects. The coordinates of these points 
together with the name of the original image constitute a record which represents the object. 

To retrieve similar images from the database, the user can either input an image and ask
the system to find all images similar to it, or sketch the boundary of the desired object using
a painting package such as xpaint. The system computes the CSS image of the input and
finds its maxima. After comparison, it assigns a matching value to every candidate image
in the database and shows the n images with the best matching values, where n is
chosen by the user.

During query processing we use our fast algorithm to compare the input and the candidate 
representations and assign a matching value to every candidate. 
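
The ranking step of a query can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `retrieve`, `database` and `match_cost` are hypothetical, and the matching function is passed in as a parameter so that any cost function (lower meaning more similar) can be used.

```python
import heapq

def retrieve(query, database, match_cost, n):
    """Return the n (cost, name) pairs with the lowest matching cost.

    database maps an image name to its stored shape descriptor;
    match_cost(query, descriptor) is assumed to return a non-negative
    number, with 0 meaning an identical shape.
    """
    scored = ((match_cost(query, desc), name) for name, desc in database.items())
    return heapq.nsmallest(n, scored)

# Toy usage: descriptors are plain numbers and the cost is their difference.
toy_db = {"a": 1.0, "b": 4.0, "c": 2.5}
best = retrieve(2.0, toy_db, lambda q, d: abs(q - d), n=2)
```

Using `heapq.nsmallest` avoids sorting the whole database when only the first n candidates are shown to the user.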

The authors are with the Department of Electronic and Electrical Engineering at the
University of Surrey, Guildford.
The Curvature Scale Space Representation

The curvature of a curve is defined as the derivative of the tangent angle to the curve. 
Consider a parametric vector equation for a curve: 

$$\vec{r}(u) = (x(u), y(u))$$

where $u$ is an arbitrary parameter. The formula for computing the curvature function can be
expressed as [5]:

$$\kappa(u) = \frac{\dot{x}(u)\,\ddot{y}(u) - \ddot{x}(u)\,\dot{y}(u)}{\left(\dot{x}^2(u) + \dot{y}^2(u)\right)^{3/2}}. \qquad (1)$$

If $\Gamma$ is a closed planar curve, $u$ can be the normalized arc length parameter, which means:

$$\Gamma = \{(x(u), y(u)) \mid u \in [0, 1]\}$$

and the denominator in equation (1) will then be equal to one and we obtain:

$$\kappa(u) = \dot{x}(u)\,\ddot{y}(u) - \ddot{x}(u)\,\dot{y}(u).$$

In the rest of this paper we will assume that the curves are closed and planar and are
expressed in terms of the normalized arc length.
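
As a quick numerical check of equation (1), the following minimal Python sketch estimates the curvature of a sampled closed curve using periodic central differences. The function name `curvature` and the finite-difference scheme are illustrative assumptions; the method described in this paper computes the derivatives by Gaussian convolution instead. Because equation (1) does not depend on the choice of parameter, the sample index is used as the parameter here.

```python
import math

def curvature(xs, ys):
    """Curvature of a closed sampled curve from equation (1).

    Derivatives are taken with respect to the sample index using
    central differences; indices wrap around because the curve is closed.
    """
    n = len(xs)
    kappa = []
    for i in range(n):
        p, q = (i - 1) % n, (i + 1) % n
        dx, dy = (xs[q] - xs[p]) / 2.0, (ys[q] - ys[p]) / 2.0
        ddx = xs[q] - 2.0 * xs[i] + xs[p]
        ddy = ys[q] - 2.0 * ys[i] + ys[p]
        kappa.append((dx * ddy - ddx * dy) / (dx * dx + dy * dy) ** 1.5)
    return kappa

# A circle of radius 2, traversed counter-clockwise, has curvature 1/2.
n = 400
xs = [2.0 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
ys = [2.0 * math.sin(2.0 * math.pi * i / n) for i in range(n)]
kappa = curvature(xs, ys)
```

With this orientation the curvature is positive; traversing the same circle clockwise would flip its sign.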

Curvature zero crossings of a curve are points with zero curvature. On smooth curves, 
wherever the sign of curvature changes, there is a point with zero curvature. 
There are several approaches to calculating the curvature of a digital curve [7]. We use the
idea of curve evolution, which studies the shape properties of a curve as it deforms in time. A certain
kind of evolution can be achieved by Gaussian smoothing to compute curvature at varying 
levels of detail. If $g(u, \sigma)$ is a 1-D Gaussian kernel of width $\sigma$, then $X(u, \sigma)$ and $Y(u, \sigma)$
represent the components of the evolved curve,

$$X(u, \sigma) = x(u) * g(u, \sigma), \qquad Y(u, \sigma) = y(u) * g(u, \sigma)$$

where $*$ is the convolution operator. According to the properties of convolution, the derivatives
of every component can be calculated easily:

$$X_u(u, \sigma) = x(u) * g_u(u, \sigma), \qquad X_{uu}(u, \sigma) = x(u) * g_{uu}(u, \sigma)$$

and we will have similar formulas for $Y_u(u, \sigma)$ and $Y_{uu}(u, \sigma)$. Since the exact forms of $g_u(u, \sigma)$
and $g_{uu}(u, \sigma)$ are known, the curvature of an evolved digital curve can be computed easily:

$$\kappa(u, \sigma) = \frac{X_u(u, \sigma)\, Y_{uu}(u, \sigma) - X_{uu}(u, \sigma)\, Y_u(u, \sigma)}{\left(X_u(u, \sigma)^2 + Y_u(u, \sigma)^2\right)^{3/2}}. \qquad (2)$$

As $\sigma$ increases, the shape of $\Gamma_\sigma$ changes. This process of generating ordered sequences of
curves is referred to as the evolution of $\Gamma$.
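
The evolution and equation (2) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' C implementation: the kernels are sampled Gaussians truncated at four standard deviations, $\sigma$ is measured in samples, and the function names (`gaussian_kernels`, `circular_convolve`, `evolved_curvature`, `zero_crossings`) and the test curve are all illustrative.

```python
import math

def gaussian_kernels(sigma):
    """Sampled first and second derivatives of a Gaussian of width sigma."""
    half = int(math.ceil(4 * sigma))          # assumed truncation at 4 sigma
    gu, guu = [], []
    for t in range(-half, half + 1):
        g = math.exp(-t * t / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        gu.append(-t / sigma ** 2 * g)
        guu.append((t * t / sigma ** 4 - 1 / sigma ** 2) * g)
    return half, gu, guu

def circular_convolve(signal, kernel, half):
    """Convolution on a closed curve: sample indices wrap around."""
    n = len(signal)
    return [sum(kernel[j + half] * signal[(i - j) % n]
                for j in range(-half, half + 1))
            for i in range(n)]

def evolved_curvature(xs, ys, sigma):
    """Curvature of the evolved curve at scale sigma, as in equation (2)."""
    half, gu, guu = gaussian_kernels(sigma)
    xu = circular_convolve(xs, gu, half)
    yu = circular_convolve(ys, gu, half)
    xuu = circular_convolve(xs, guu, half)
    yuu = circular_convolve(ys, guu, half)
    return [(a * d - c * b) / (a * a + b * b) ** 1.5
            for a, b, c, d in zip(xu, yu, xuu, yuu)]

def zero_crossings(kappa):
    """Indices i where the curvature changes sign between samples i and i+1."""
    n = len(kappa)
    return [i for i in range(n) if kappa[i] * kappa[(i + 1) % n] < 0]

# Test curve: a three-lobed closed contour with three concavities.
n = 200
theta = [2.0 * math.pi * i / n for i in range(n)]
xs = [(1.0 + 0.5 * math.cos(3 * t)) * math.cos(t) for t in theta]
ys = [(1.0 + 0.5 * math.cos(3 * t)) * math.sin(t) for t in theta]
fine = zero_crossings(evolved_curvature(xs, ys, 2.0))
coarse = zero_crossings(evolved_curvature(xs, ys, 40.0))
```

For this test curve the three concavities give six curvature zero crossings at a small scale, and they all disappear once the curve has been smoothed into a convex shape.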

2 Curvature Scale Space Matching 

We assume that the user enters a query either by sketching the boundary of the desired object or
by pointing to an image, and wants the system to retrieve all similar images from the
database. In each case, we apply the same preprocessing to find the maxima of the CSS contours
of the input image and compare them with the corresponding descriptors of the database images. For
convenience, from now on we refer to the input as the image and to the images in the database as
models. In this section we explain the matching algorithm, which differs from
the one proposed in [6].
Algorithm



After extracting maxima of every model, we normalize their location so that the horizontal 
coordinate u varies in the range [0, 1). This will ensure that the comparison is meaningful even 
if the number of samples in image and model are different. The maxima of every model are 
sorted according to their $\sigma$-coordinate (filter width) during the process of maxima extraction.
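
The normalization and sorting can be sketched as follows. The function name `normalize_maxima` and the input layout (a list of `(sample_index, sigma)` pairs) are assumptions made for illustration. Sorting in descending order of $\sigma$ is also an assumption, chosen because the matching procedure starts from the largest scale maximum.

```python
def normalize_maxima(maxima, n_samples):
    """Map sample indices to u in [0, 1) and sort by descending sigma.

    maxima is a list of (sample_index, sigma) pairs taken from a CSS
    image whose contour was sampled at n_samples points.
    """
    scaled = [((idx % n_samples) / n_samples, sigma) for idx, sigma in maxima]
    return sorted(scaled, key=lambda m: -m[1])

descriptor = normalize_maxima([(50, 3.0), (150, 9.5), (0, 6.0)], 200)
```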

The matching algorithm which compares the two sets of maxima, one from the image and 
the other from the model is as follows. 

1. Create a node consisting of the largest scale maximum of the image and the largest scale
maximum of the model. If more than one maximum in the model has a
$\sigma$-coordinate close (within 80 percent) to that of the largest scale maximum of the image, create
extra nodes, each consisting of the largest scale maximum of the image and one of these
additional maxima of the model.

Initialize the cost of each node to the absolute difference of the $\sigma$-coordinates of the image
and model maxima in that node.

Compute a CSS shift parameter $\alpha$ for each node:

$$\alpha = U_m - U_i$$

where $U$ is the horizontal coordinate of a maximum, and $i$ and $m$ refer to image and
model respectively. If the model and the image are the same and we shift the CSS of the
image horizontally by $\alpha$, the two CSS images should coincide.

2. Create two lists for each node obtained in step 1. The first list will contain the image
curve maxima and the second list will contain the model curve maxima matched within
that node at any point of the matching procedure. Initialize the first and second list of
each node with the corresponding maxima determined in step 1.

3. Expand each node created in step 1 using the procedure described in step 4. 

4. To expand a node, select the largest scale image curve CSS maximum (which is not in
the first list) and apply that node's shift parameter computed in step 1 to map that
maximum to the model CSS image. Locate the nearest model curve CSS maximum
(which is not in the second list). If the two maxima are within a reasonable horizontal
distance of each other (0.2 of the maximum possible distance), define the cost of the match as the
straight line distance between the two maxima. Otherwise, define the height of the
image curve CSS maximum as the cost of the match. If there are no more image curve
CSS maxima left, define the cost of the match as the height of the highest model curve
CSS maximum not in the node's second list. Likewise, if there are no more model curve
CSS maxima left, define the cost of the match as the height of the selected image curve
maximum. Add the match cost to the node cost. Update the two lists associated with
the node.

5. Select the lowest cost node. If there are no more model or image curve CSS maxima 
that remain unmatched within that node, then return that node as the lowest cost 
node. Otherwise, go to step 4 and expand the lowest cost node. 

6. Reverse the place of the image and the model and repeat steps 1 to 5 to find the lowest 
cost node in this case. 

7. Take the cost of the lower-cost of the two nodes found in steps 5 and 6 as the final
matching cost between the image and the model.
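
Steps 1 to 7 can be sketched in Python as below. This is one possible reading of the procedure under several labeled assumptions, not the authors' code. A descriptor is assumed to be a list of `(u, sigma)` pairs with u in [0, 1) and with $\sigma$ scaled so that it is commensurate with u. "Within 80 percent" is read as the smaller $\sigma$ being at least 0.8 of the larger. The horizontal axis is treated as circular, the distance threshold is taken as 0.2 in units of u, and an empty descriptor is charged the total height of the other one. All function and parameter names are hypothetical.

```python
import heapq
import itertools
import math

def circ_dist(a, b):
    """Horizontal distance on the circular u axis, in [0, 0.5]."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def expand(node, image, model, max_du):
    """Step 4: match (or penalize) one more maximum and return the new node."""
    cost, alpha, used_i, used_m = node
    free_i = [k for k in range(len(image)) if k not in used_i]
    free_m = [k for k in range(len(model)) if k not in used_m]
    if not free_i:
        # No image maxima left: pay the height of the highest unused model maximum.
        k = free_m[0]
        return (cost + model[k][1], alpha, used_i, used_m | {k})
    i = free_i[0]                      # largest scale unused image maximum
    u = (image[i][0] + alpha) % 1.0    # map it into the model CSS image
    s = image[i][1]
    used_i = used_i | {i}
    if free_m:
        k = min(free_m, key=lambda j: circ_dist(u, model[j][0]))
        du = circ_dist(u, model[k][0])
        if du <= max_du:
            step = math.hypot(du, s - model[k][1])   # straight line distance
            return (cost + step, alpha, used_i, used_m | {k})
    # Too far from every model maximum, or none left: pay the image height.
    return (cost + s, alpha, used_i, used_m)

def one_way_cost(image, model, ratio=0.8, max_du=0.2):
    """Steps 1 to 5 for one assignment of the image and model roles."""
    image = sorted(image, key=lambda m: -m[1])
    model = sorted(model, key=lambda m: -m[1])
    if not image or not model:         # assumption: charge all unmatched heights
        return sum(s for _, s in image) + sum(s for _, s in model)
    u0, s0 = image[0]
    starts = [0] + [k for k in range(1, len(model))
                    if min(s0, model[k][1]) >= ratio * max(s0, model[k][1])]
    tie = itertools.count()            # tie-breaker so the heap never compares nodes
    heap = []
    for k in starts:                   # step 1: one node per candidate pairing
        node = (abs(s0 - model[k][1]), model[k][0] - u0,
                frozenset([0]), frozenset([k]))
        heapq.heappush(heap, (node[0], next(tie), node))
    while True:                        # step 5: always expand the cheapest node
        cost, _, node = heapq.heappop(heap)
        if len(node[2]) == len(image) and len(node[3]) == len(model):
            return cost
        node = expand(node, image, model, max_du)
        heapq.heappush(heap, (node[0], next(tie), node))

def match_cost(image, model):
    """Steps 6 and 7: run both role assignments and keep the lower cost."""
    return min(one_way_cost(image, model), one_way_cost(model, image))

shape = [(0.1, 0.9), (0.5, 0.5), (0.8, 0.3)]
shifted = [((u + 0.3) % 1.0, s) for u, s in shape]   # same shape, new start point
fewer = [(0.1, 0.9), (0.5, 0.5)]                     # smallest maximum missing
```

In this sketch the explicit first expansion of step 3 is folded into the loop of step 5. Since every expansion adds a non-negative cost and each node follows a single expansion path, expanding the cheapest node first should return the same final cost. A shifted copy of a descriptor matches at (numerically) zero cost because the shift parameter $\alpha$ absorbs the change of starting point, while the descriptor with a missing maximum is charged that maximum's height.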
3 Results and discussion



We tested the proposed method on a database of 450 images of marine animals. Each image 
consisted of just one object on a uniform background. The system software was developed
in C under the Unix operating system. The response time of the system was less
than one second for every user query.

After extracting the contour of the object in each image, it was sampled at 200 equally
spaced points. The normalized coordinates of these points were then used to find the CSS
image of the contour. This procedure was followed by the extraction of the maxima of every
CSS image. The coordinates of these maxima were stored in a record together with the number
of rows and columns of the CSS image and the name of the original image, in a file which
was read at the beginning of every query.

We discuss the experimental results by presenting the response of the system to some queries.
In the first one, the user pointed to an image which already existed in the database and asked
for similar images. The input image is in figure 2a and the output of the system is in figure 3a.
The first answer of the system is identical to the input image, with a zero match value. For the
remaining images, the match value varies between 0.29 and 0.38. When the match value is more
than 0.45, the similarity between input and output images is poor. Note that the fourth and
fifth output images differ only in scale.
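
The resampling of an extracted contour at equally spaced points, described at the start of this section, can be sketched as follows. This is an illustrative sketch only: the function name `resample_closed` is hypothetical, the contour is assumed to be given as a closed polygon of (x, y) vertices, and linear interpolation along the polygon edges is an assumption about how the sampling might be done.

```python
import math

def resample_closed(points, n=200):
    """Resample a closed polygon at n points equally spaced in arc length."""
    m = len(points)
    # Length of each edge, including the closing edge back to the first vertex.
    seg = [math.hypot(points[(i + 1) % m][0] - points[i][0],
                      points[(i + 1) % m][1] - points[i][1]) for i in range(m)]
    total = sum(seg)
    out, i, start = [], 0, 0.0         # start = arc length at vertex i
    for k in range(n):
        target = k * total / n
        while i < m - 1 and start + seg[i] <= target:
            start += seg[i]
            i += 1
        t = (target - start) / seg[i] if seg[i] > 0 else 0.0
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % m]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

# Toy usage: a unit square resampled at 8 points (every half edge).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
samples = resample_closed(square, n=8)
```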

In the second query, the user used the mouse to paint an outline of the desired object. Figure
2b is the input, which is visibly noisy, and figure 3b shows the retrieved images. As this
example shows, the system is robust with respect to noise, because the locations of the maxima of
the CSS image are identified after smoothing, which in the existing system begins
with a Gaussian filter with $\sigma = 1$. Although this property is beneficial in most cases like this
example, sometimes, e.g. when there are ripples on the contour of the original image,
it may remove some useful information and even some maxima of the CSS image. In general,
there must be a limit on the initial value of $\sigma$ so that the noise is removed and the useful
information is not lost.

Other examples are presented in figures 3c and 3d, where the inputs and the first outputs 
of the system are identical and other outputs are similar to the inputs. These examples also 
show the variety of shapes of objects in our database. 

4 Conclusions 

This paper described a method to retrieve similar images from large image databases using 
their shape properties. A database of 450 images of marine animals was selected to test 
the method. The boundary of every object was extracted and the curvature scale space 
image of this boundary was computed. Then the maxima of this image were extracted to 
be used as the shape descriptors. A fast and reliable algorithm for comparing these
descriptors, together with the results of several queries, was presented.